ABSTRACT

The nucleotide sequence of the ompF gene coding for a major outer
membrane protein of Escherichia coli K-12 has been determined and the amino 
acid sequence of the OmpF protein was deduced from it. The OmpF protein 
contains 340 amino acid residues, and is produced from a precursor having 22 
extra amino acid residues, the signal peptide, at the amino terminus. The 
expected secondary structure of the OmpF protein had a high β-sheet content
with a low α-helix content. The promoter region and the transcription
termination region of the ompF gene had a significantly high AT content, while
the AT content of the coding region was about the same as the average AT
content of the E. coli chromosome. Following the termination codon, a typical
ρ-independent transcription termination signal was observed. The codon usage
in the ompF gene was highly nonrandom; the codons preferentially utilized are
those recognized by the most abundant species of isoaccepting tRNAs or those,
among synonymous codons recognized by the same tRNA, that can interact more
properly with the anticodon.



INTRODUCTION 

The outer membrane of Escherichia coli K-12 usually contains four major 
proteins, the OmpF protein, the OmpC protein, the OmpA protein and Braun's 
lipoprotein, and these are the most abundant proteins in the cell (1). These 
proteins are first synthesized in a precursor form having a signal peptide
at the NH2-terminus (2, 3, 4). Two of these proteins, OmpF and OmpC, resemble
each other with respect to the apparent molecular weight (5), total amino
acid composition and NH2-terminal sequence (6), strong association with the
peptidoglycan layer (7), extremely high contents of β-structured polypeptide
(5), trimeric structures (8), and pore functions for small hydrophilic
molecules (9). Four genetic loci (ompF, ompC, ompR and envZ) are known to be
involved in the synthesis of these two proteins. The ompF and ompC loci
represent the structural genes for the corresponding proteins (6,
10, 11), and are suggested to be derived from a single ancestral gene (6).
On the other hand, the gene products of ompR and envZ have been shown to
positively regulate the synthesis of these proteins (12). The ompR-envZ
region was formerly called ompB.

Despite the similarities described above, the biosynthesis of the two 
proteins is regulated differentially. The envZ mutation results in the lack 
of the OmpF protein, while mutations in the ompR gene cause the lack of
either OmpC alone or both proteins (12). In addition, the synthesis of the two
proteins is affected in opposite directions by high concentrations of 
substances like sucrose and NaCl in culture media (13, 14). Upon addition of 
these substances to culture media, protein synthesis is immediately switched 
from OmpF to OmpC. It is suggested that recognition by cells of the osmotic 
difference between the outside and inside of the outer membrane is involved 
in this switching phenomenon (14). 

In order to study the molecular mechanism controlling the synthesis of 
these outer membrane proteins, we isolated a specialized transducing lambda 
phage that carries the ompF gene of E. coli K-12 (15). In a previous study
(4), we determined the DNA sequence that covers the NH2-terminal region of
OmpF and deduced the amino acid sequence of the signal peptide. Here we 
present the entire DNA sequence of the ompF gene. 

MATERIALS AND METHODS 
Enzymes and Chemicals.

Restriction endonucleases Rsa I, Pst I, Alu I, Taq I and Pvu II were
obtained from New England Biolabs. Other restriction endonucleases were from
Takara Shuzo Co. Bacterial alkaline phosphatase was from Worthington
Biochemical Corp., bacteriophage T4 polynucleotide kinase from P.L. Biochemicals,
and T4 ligase from Takara Shuzo Co. [γ-32P]ATP (9,000 Ci/mmol) was prepared
from carrier-free [32P]orthophosphoric acid (Amersham Intl.) and ADP (Sigma
Chemicals) by the method of Johnson and Walseth (16).
Bacterial Strains and Bacteriophages.

The following strains derived from E. coli K-12 and phages were used:
KY2562 (thi tsx malA ompB101) (17); H0202mal+ (F- thi rel asnSts, a mal+
derivative of H0202) (4); λompF1 (a specialized transducing λ phage carrying
the asnS-ompF region of the E. coli chromosome) (15); and λcI857Sam7.
Preparation of λompF1 DNA and Subcloning into pBR322.

The methods used for the propagation of λompF1 were those described
by Schrenk and Weisberg (18). Strain H0202mal+ grown in L-broth at 30°C was
used as the host strain and λcI857Sam7 as the helper phage. The λompF1
DNA was extracted with phenol as described (19) and recovered by ethanol
precipitation. After digestion with Sal I, the 14-kb DNA fragment shown in
Fig. 1 was isolated from an agarose gel and ligated with the cloning vector
pBR322 as described (20). Ligation mixtures were used directly for
transformation using the procedure of Dagert and Ehrlich (21).
Purification of Plasmid DNA.

Strain KY2562 harboring a pBR322-derived plasmid was grown until A660 =
1.3 at 37°C in L-broth supplemented with glucose (1 g/l) and ampicillin (20
mg/l). Chloramphenicol (100 mg/l) was then added and the cells were harvested
18 h later. Purification of plasmid DNA was carried out as described by
Matsubara (22).

Gel Electrophoresis of DNA Fragments.

Gel electrophoresis was carried out for both analytical and preparative
purposes. Polyacrylamide gel (5%) was used for the separation of DNA fragments
smaller than 1,500 bp and 0.8% agarose gel was used for fragments larger than
1,500 bp. The buffer for electrophoresis contained 50 mM Tris-borate (pH 8.3)
and 1 mM EDTA. DNA fragments were eluted from the polyacrylamide gel by the
crush and soak technique (23) and from the agarose gel by the freeze-squeeze
technique (24), extracted with phenol and precipitated with ethanol.
Restriction Endonuclease Mapping and DNA Sequencing.

A restriction enzyme cleavage map was constructed using both the
end-labeling method (25) and the double digestion method (26). All DNA
sequencing was carried out according to Maxam and Gilbert (23). Single
end-labeled fragments obtained after the chemical cleavages were analyzed by
means of a thin sequencing gel system (0.04 x 20 x 40 cm) with 20% and 10%
polyacrylamide in 7 M urea.

RESULTS 

Cloning of the ompF Gene on Plasmid Vector pBR322.

We constructed various hybrid plasmids carrying the ompF region of
λompF1 as shown in Fig. 1. The 14-kb Sal I DNA fragment from λompF1 was
inserted into the Sal I site of pBR322. The ligated mixture was used to
transform KY2562, selecting for ampicillin resistance and tetracycline
sensitivity. Two types of 18.3-kb plasmids having the 14-kb insert at the
Sal I site of pBR322 in opposite orientations were found in the
resulting transformants. Representative plasmids were designated pLF2 and
pLF3, respectively. pLF10 was derived from pLF2 by digestion with Pvu
II followed by religation. pLF11 was derived from pLF10 by Eco RI digestion.
Similarly, pLF4 was derived from pLF3 by Hin dIII digestion, and pLF9 from

Figure 1. Construction of hybrid plasmids carrying the ompF gene. The heavy
solid lines represent E. coli chromosomal DNA and the open segments represent
λ phage DNA. pBR322 DNA is shown as thin lines. The position of the ompF
gene is indicated by a line with ■ (initiation site) and/or ► (termination
site). S, Sal I; B, Bam HI; E, Eco RI; H, Hin dIII; P, Pvu II; Apr, ampicillin
resistant; Tcr, tetracycline resistant; ori, origin of replication. In the
λompF1 DNA, only Bam HI and Sal I sites are shown.



pLF4 by Pvu II digestion. The 3.7-kb Sal I-Pvu II fragment contained in pLF10
was called Fragment 2 in a previous paper (4).
Restriction Endonuclease Mapping and DNA Sequencing.

Figure 2 shows a restriction endonuclease cleavage map around the ompF 
gene. It was incorrectly reported in a previous paper that the region derived 

Figure 2. Restriction endonuclease cleavage sites and sequencing strategy
around the ompF gene. The singly labeled fragments are indicated by arrows.
(#) The position of the 32P label at the 5'-end. Broken regions of the arrows
indicate that the sequences of these regions were not determined in these
experiments. The coding region of the ompF gene is also indicated with ■
(initiation site) and ► (termination site).

from the E. coli chromosome shown in Fig. 1 has one Bam HI site (15). However,
no Bam HI site was found in this region. The nucleotide sequence
shown in Fig. 3 also confirmed this fact. We showed previously (4) that the
OmpF signal peptide is coded for by the 180-bp Pst I DNA fragment and the
direction of transcription is from left to right in Fig. 2. In order to
determine the entire ompF gene we sequenced the entire 1.8-kb region shown in
Fig. 2. The restriction endonuclease fragments used for the sequence analyses 
are also shown in Fig. 2. 
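
The absence of a Bam HI site can be checked against any determined sequence by searching for the enzyme's recognition sequence. The following is a minimal sketch, assuming the standard Bam HI recognition sequence GGATCC; the function name `find_sites` and the two toy sequences are hypothetical and are not taken from Fig. 3.

```python
def find_sites(sequence, recognition):
    """Return the 1-based start positions of `recognition` within `sequence`."""
    seq = sequence.upper()
    site = recognition.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to nucleotide number
        start = seq.find(site, start + 1)
    return positions


# GGATCC is palindromic, so scanning one strand is sufficient.
BAMHI = "GGATCC"

# Toy sequences for illustration only (not the ompF sequence).
toy_with_site = "AAAGGATCCTTT"
toy_without_site = "AAAGGATGCTTT"

print(find_sites(toy_with_site, BAMHI))
print(find_sites(toy_without_site, BAMHI))
```

An empty list for the full 1807-bp sequence would correspond to the statement that no Bam HI site exists in the region.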

The nucleotide sequence of 1807 bp covering the entire region shown in 
Fig. 2 was determined (Fig. 3). The determination was carried out for both 
strands of most of the DNA. In addition, all restriction sites were
overlapped by sequence determinations of different DNA fragments. Therefore, no
sequence information was missed as a result of the loss of a very small
restriction fragment. Following the sequence that covers the promoter, signal
peptide and NH2-terminal regions of the OmpF protein (4), there was an open
reading frame sufficiently long to encode the OmpF protein. This is the only 
long open reading frame that exists in this region. This reading frame is 
terminated by the two contiguous termination codons (nucleotides 1542-1547), 
thus making a protein of a molecular weight of 37,082 (excluding the signal 

Figure 3. The 1807-base-pair DNA sequence encompassing the ompF gene. The
amino acid sequence of the pro-OmpF protein deduced from the DNA sequence is
also shown. The cleavage site of the signal peptide is shown by a triangle.
The amino acid residues are numbered from the amino terminus of the OmpF
protein. Possible transcription termination signals are underlined.

peptide). This molecular weight is in good agreement with that estimated
from the gel electrophoretic behavior of the OmpF protein (MW = 38,000) (5).
The amino acid sequence deduced from the nucleotide sequence is also shown in
Fig. 3. The amino acid composition derived from this sequence is also in
excellent agreement with our earlier estimates based on amino acid analysis
of acid hydrolyzates of the whole OmpF protein (6). In addition, the deduced
sequence is almost the same as that of the corresponding protein of E. coli B
determined by Chen et al. (27). Following the termination codons, there is
a possible ρ-independent transcription termination signal, which is
characterized by an oligo(T) sequence and a stable stem-and-loop structure
(nucleotides 1570-1599) (28). This will be discussed later. From these
results we conclude that the DNA sequenced here represents the entire ompF
gene.
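
The search for a long open reading frame described above can be illustrated with a short scan of the three forward reading frames. This is a minimal sketch rather than the procedure actually used: the function name `longest_orf` and the toy sequence are hypothetical, only ATG is treated as an initiation codon, and the complementary strand is not scanned.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(sequence):
    """Find the longest ATG-to-stop open reading frame on the given strand.

    Returns (start, end, n_codons), where start is the 1-based position of
    the A of ATG, end is the 1-based position of the last base before the
    stop codon, and n_codons counts the sense codons. Returns None if no
    complete ORF is present.
    """
    seq = sequence.upper()
    best = None
    for frame in range(3):
        orf_start = None  # 0-based index of the first ATG of the current ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None:
                if codon == "ATG":
                    orf_start = i
            elif codon in STOP_CODONS:
                n_codons = (i - orf_start) // 3
                if best is None or n_codons > best[2]:
                    best = (orf_start + 1, i, n_codons)
                orf_start = None
    return best


# Toy sequence for illustration only: ATG AAA TTT GGG followed by TAA.
print(longest_orf("CCATGAAATTTGGGTAACC"))
```

With this numbering convention, an initiation codon at 456 and a termination codon at 1542 would be reported as start 456, end 1541 and 362 codons.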

DISCUSSION 

We previously determined the DNA sequence for the promoter-signal 
sequence region of the ompF gene and presented the amino acid sequence of the 
signal peptide (4). Here we present the DNA sequence of 1807 bp
encompassing the entire ompF gene. Its unique features are discussed in the
following sections.
Structure of the ompF gene.

The DNA sequence shown in Fig. 3 together with previous results (4, 29) 
indicates that the OmpF protein is first synthesized as a precursor form that 
consists of 362 amino acid residues starting with initiation codon ATG 
(nucleotides 456-458) and terminating with termination codon TAA (nucleotides 
1542-1544). The precursor is converted to the OmpF protein having 340 amino
acid residues by release of the NH2-terminal signal peptide (22 amino acid
residues). It is interesting that the ompF gene has both the initiation 
codons (nucleotides 456-461) and the termination codons (nucleotides 1542- 
1547) in tandem fashion. No potential open reading frame that could code for 
other proteins can be found in this locus. 
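
The residue counts follow directly from the nucleotide coordinates quoted in this section. The short check below uses only numbers stated in the text; the variable names are illustrative.

```python
ATG_START = 456       # first base of the initiation codon ATG
STOP_START = 1542     # first base of the termination codon TAA
SIGNAL_PEPTIDE = 22   # residues released from the precursor

# Sense codons span nucleotides 456..1541 inclusive.
coding_nt = STOP_START - ATG_START
precursor_residues = coding_nt // 3
mature_residues = precursor_residues - SIGNAL_PEPTIDE

print(coding_nt, precursor_residues, mature_residues)  # 1086 362 340
```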

The coding region of the ompF gene is preceded by candidates for the
Shine-Dalgarno sequence, Pribnow box and the RNA polymerase recognition site
as discussed in a previous paper (4). The initiation site of transcription
is being studied in this laboratory in relation to the mechanism of ompF
expression, which is regulated by the osmolarity of the culture media. The DNA
sequence also shows the possible ρ-independent transcription termination
signal (nucleotides 1570-1599) that can form an extremely stable
stem-and-loop structure with an oligo(T) sequence at the end (28). A similar
structure has been found at the corresponding region of the lpp and ompA
genes, the genes for other major outer membrane proteins (30, 31, 32). A
similar stem-and-loop structure is also located preceding the ompF gene
(nucleotides 1-37). The asnS gene is located immediately upstream of the ompF
gene (N. Mutoh, Y. Koga, K. Inokuchi and S. Mizushima, manuscript in
preparation). Therefore, this region may represent the transcription
termination signal of the asnS gene.

Figure 4 shows the distribution of AT base pairs around the ompF gene.
It is apparent that the first 455 base pairs that cover the promoter region
of the ompF gene have a significantly high AT content (69%), as in the
corresponding region of the major lipoprotein gene, lpp (80%) (33), in
comparison with the average AT content (49%) of the E. coli chromosomal DNA
(34). Many A or T clusters (9-15 contiguous nucleotides) are found in this
region, as in the case of the lpp gene. On the other hand, the AT content of
the promoter region of the ompA gene is not so high (56%) (31). The high AT
content in the promoter region possibly contributes to destabilization of the
double helical structure of the DNA and facilitates RNA polymerase-mediated
strand unwinding (35, 36). The AT content decreased to 52% in the coding
region (nucleotides 456-1541), and again increased to 67% in the region
probably representing the 3'-terminus of the mRNA (nucleotides 1542-1599).
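
The regional AT contents quoted above are simple base counts over nucleotide intervals. The sketch below shows the calculation; the region boundaries in `FIG4_REGIONS` are those given in the text, whereas the function names and the toy sequence are hypothetical.

```python
def at_content(sequence, start, end):
    """Percentage of A+T in the 1-based inclusive interval [start, end]."""
    region = sequence.upper()[start - 1:end]
    if not region:
        raise ValueError("empty region")
    at_count = sum(base in "AT" for base in region)
    return 100.0 * at_count / len(region)


def at_profile(sequence, regions):
    """AT content (%) for each named (start, end) region."""
    return {name: at_content(sequence, s, e) for name, (s, e) in regions.items()}


# The four sections distinguished for the 1807-bp ompF sequence.
FIG4_REGIONS = {
    "promoter region": (1, 455),
    "coding region": (456, 1541),
    "3'-terminus of mRNA": (1542, 1599),
    "subsequent region": (1600, 1807),
}

# Toy sequence and regions for illustration only (not the ompF sequence).
toy = "AATTAATTGGCCGGCC"
toy_regions = {"AT-rich half": (1, 8), "GC-rich half": (9, 16)}
print(at_profile(toy, toy_regions))
```

Calling `at_profile` with the full sequence and `FIG4_REGIONS` would give the four values to compare with the 49% chromosomal average.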
Codon Usage.

The codon usage in the ompF gene is highly nonrandom (Table 1). A
similar feature was also found in genes for other major outer membrane
proteins (30, 31, 37), ribosomal proteins (for example 38, 39) and others
(40, 41) that are efficiently synthesized in E. coli cells. The codon usage
in the ompA, lpp and lamB genes is also shown in Table 1 for comparison. It was
suggested that the codons preferentially utilized in genes conducting
efficient protein synthesis are those recognized by the most abundant species
of isoaccepting tRNAs (42, 43). It was further suggested for these genes
that a preferentially utilized codon among synonymous codons that are
recognized by the same tRNA has a preference in a codon-anticodon interaction (42,



Figure 4. Distribution of AT base pairs around the ompF gene. The ompF gene
region is divided into the following four sections: the region involving the
promoter (1-455), the coding region (456-1541), the region probably
representing the 3'-terminus of the mRNA (1542-1599), and the subsequent
region (1600-1807). The average AT content of E. coli chromosomal DNA (49%)
(34) is indicated by a broken line.

Table 1. Codon Usage in the Precursors of the Major Outer Membrane Proteins

Amino acid    (a)   (b)   (c)   (d)
Phe             7     2     0     1
               12     7     0    16
                2     1     0     1
                0     0     0     2
Ser             6     4     3     7
Tyr             9     2     0     7
               20    15     1    15
Term            1     1     1     1
Cys             0     1     0     2
                0     1     1     0
Term            -     -     -     -
Trp             2     5     0    19
Leu             3     0     0     2
                0     0     0     3
                0     0     0     0
               18    22     9    11
Pro             2     1     0     3
His             1     2     0     3
Gln             6     2     0     6
Arg            10    10     3    11
Ile             1     1     0     7
               13    15     2    13
                0     0     0     0
Met             5     6     3    16
                0     0     0     0
Val            16    17     3    10
Ala            16    22     8    12
                2     1     0    10
               11    11     3    12
Asp            13     5     2    14
               14    17     6    19
Glu            13    10     0    18
Gly            33    24     2    26
               15    14     1    22
                0     0     0     0
                1     0     0     1

(a) Precursor of the OmpF protein.
(b) Precursor of the OmpA protein (31, 32).
(c) Precursor of the lipoprotein (30).
(d) Precursor of the LamB protein (37).



43). The nonrandom codon usage in the ompF, ompA and lpp genes meets these
requirements. For example, preferential utilization of CUG for leucine, GGU/
GGC for glycine and AUC/AUU for isoleucine must contribute to the efficient 
translation of the ompF gene through the recognition by the abundant species 
of isoaccepting tRNA, and preferential utilization of AAA over AAG for 
lysine, GAA over GAG for glutamic acid, AUC over AUU for isoleucine and AAC 
over AAU for asparagine must contribute to the efficient translation through 
the proper codon-anticodon interaction. Although the LamB protein becomes a 
major outer membrane protein when the synthesis is induced by maltose, the 
nonrandomness of the codon usage in the lamB gene is not as extreme as that 
in genes for major outer membrane proteins. 
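
Codon usage of the kind tabulated in Table 1 is obtained by counting the in-frame triplets of a coding sequence. The following is a minimal sketch; the function names and the toy coding sequence are hypothetical, and codons are written in RNA letters to match the text.

```python
from collections import Counter


def codon_usage(coding_sequence):
    """Count in-frame codons of a coding DNA sequence, reported as RNA codons."""
    seq = coding_sequence.upper().replace("T", "U")
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return Counter(seq[i:i + 3] for i in range(0, len(seq), 3))


def preferred(usage, synonymous_codons):
    """Return the most frequently used codon among a set of synonymous codons."""
    return max(synonymous_codons, key=lambda codon: usage[codon])


# Toy coding sequence for illustration only: ATG CTG CTG CTT AAA.
usage = codon_usage("ATGCTGCTGCTTAAA")
print(usage["CUG"], usage["CUU"])
print(preferred(usage, ["CUU", "CUC", "CUA", "CUG"]))
```

Applied to nucleotides 456-1541 of the ompF sequence, `codon_usage` would yield the counts for the OmpF precursor.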
Protein Structure.

E. coli K-12 possesses two matrix proteins, OmpF and OmpC, while the B
strain has one such protein that migrates to the same position as the K-12
OmpF protein on polyacrylamide gel (44). The primary structure of the OmpF
protein deduced from the DNA sequence (Fig. 3) is almost the same as that of 
the corresponding protein of the B strain reported by Chen et al. (27) except 

Figure 5. Expected secondary structure of the OmpF protein. Two expected
structures predicted by the method of Chou & Fasman (45) and that of Nagano
(46) are shown. Symbols denote α-helix, β-sheet, β-turn (Chou & Fasman)
or β-turn and loop (Nagano), and coil. In cases of ambiguity in the
prediction both conformational states are indicated. The histograms show the
sum of the individual conformational states predicted by the two methods for
α-helix and β-sheet. The values were calculated as follows: 1 was given to
the region expected to be one of the relevant conformational states by one
method, and 0.5 was given to the region in which two conformational states
were expected by one method. The regions over 1.5 are shaded.



for glutamine, glutamic acid and glutamine residues at positions 66, 117 and 
262, respectively. These replacements of amino acid residues can be accounted 
for by a single base change in the gene. Therefore, we conclude that the two 
proteins are essentially the same. 
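
Whether an amino acid replacement can arise from a single base change can be checked against the standard genetic code. The sketch below assumes, for illustration only, that the differences are exchanges between glutamine (Q) and glutamic acid (E); the text does not state the residues found in the B strain, and the function name is hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def single_base_changes(aa1, aa2):
    """Codon pairs (codon for aa1, codon for aa2) differing at exactly one base."""
    pairs = []
    for codon1, a1 in CODON_TABLE.items():
        if a1 != aa1:
            continue
        for codon2, a2 in CODON_TABLE.items():
            if a2 == aa2 and sum(x != y for x, y in zip(codon1, codon2)) == 1:
                pairs.append((codon1, codon2))
    return pairs


# Assumed Gln/Glu exchange: both Gln codons are one base away from a Glu codon.
print(single_base_changes("Q", "E"))
```

An empty result would indicate that at least two base changes are required for the replacement.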

The translated sequence of the OmpF protein was analyzed for the expected
secondary structure by using the method of Chou and Fasman (45) and that of
Nagano (46) (Fig. 5). Although the current state of secondary structure
prediction is not perfect and many contradictions are, indeed, found between
the structures predicted by the two methods, it is clear that the possible
β-sheet content is far greater than that of the α-helix. Figure 5 also
shows histograms in which information obtained by the two methods is
incorporated. The region strongly supported by the two methods to be β-sheet
(larger than 1.5 in the histogram) is five times larger than the corresponding
region for the α-helix. These results are consistent with our earlier
circular dichroism studies showing that the native OmpF protein is peculiar in
that it has a very high content of β-sheet with a low content of α-helix (5).
It should be taken into consideration, however, that the OmpF protein is a
transmembrane protein and exists as a trimer, while most of the proteins used
for the derivation of the prediction rules are globular hydrophilic proteins.
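
The histogram values of Fig. 5 combine the two predictions residue by residue: a method contributes 1 when it assigns the state unambiguously and 0.5 when it assigns two states, and the contributions of the two methods are summed. The sketch below follows that rule; the data representation (one set of state letters per residue, with H for helix, E for sheet and C for coil), the function names and the toy predictions are assumptions made for illustration.

```python
def state_score(prediction, state):
    """Score one method's prediction at one residue for a given state.

    `prediction` is the set of states the method assigns to the residue:
    one state if unambiguous, two states in cases of ambiguity.
    """
    if state not in prediction:
        return 0.0
    return 1.0 if len(prediction) == 1 else 0.5


def combined_scores(method_a, method_b, state):
    """Per-residue sum of the scores of two prediction methods."""
    return [state_score(a, state) + state_score(b, state)
            for a, b in zip(method_a, method_b)]


def strongly_supported(scores, threshold=1.5):
    """0-based residue indices whose combined score exceeds the threshold."""
    return [i for i, score in enumerate(scores) if score > threshold]


# Toy predictions for four residues (illustrative only).
chou_fasman = [{"E"}, {"E"}, {"H", "E"}, {"C"}]
nagano = [{"E"}, {"H", "E"}, {"H"}, {"E"}]

sheet = combined_scores(chou_fasman, nagano, "E")
helix = combined_scores(chou_fasman, nagano, "H")
print(sheet, helix, strongly_supported(sheet))
```

Comparing the number of residues returned by `strongly_supported` for the two states gives the kind of ratio quoted above for β-sheet versus α-helix.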

ABBREVIATIONS

bp, base pair(s); kb, kilobases or kilobase pairs; EDTA,
ethylenediaminetetraacetate; Tris, Tris(hydroxymethyl)aminomethane; MW,
molecular weight.